
Residualized Similarity for Faithfully Explainable Authorship Verification

Zeng, Peter, Alipoormolabashi, Pegah, Mun, Jihu, Dey, Gourab, Soni, Nikita, Balasubramanian, Niranjan, Rambow, Owen, Schwartz, H.

arXiv.org Artificial Intelligence

Responsible use of Authorship Verification (AV) systems requires not only high accuracy but also interpretable solutions. More importantly, for a system to be used in decisions with real-world consequences, its predictions must be explainable in terms of interpretable features that can be traced back to the original texts. Neural methods achieve high accuracies, but their representations lack direct interpretability. Furthermore, LLM predictions cannot be explained faithfully -- if an explanation is given for a prediction, it does not represent the reasoning process behind the model's prediction. In this paper, we introduce Residualized Similarity (RS), a novel method that supplements systems using interpretable features with a neural network to improve their performance while maintaining interpretability. Authorship verification is fundamentally a similarity task, where the goal is to measure how alike two documents are. The key idea is to use the neural network to predict a similarity residual, i.e., the error in the similarity predicted by the interpretable system. Our evaluation across four datasets shows that not only can we match the performance of state-of-the-art authorship verification models, but we can also show how, and to what degree, the final prediction is faithful and interpretable.
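The core idea of the abstract, an interpretable similarity score corrected by a neural residual, can be sketched as follows. This is a hedged illustration, not the authors' implementation: the feature vectors, the cosine-similarity base score, and the stand-in residual model are all assumptions made for the example.

```python
import numpy as np

def interpretable_similarity(feats_a, feats_b):
    """Cosine similarity over interpretable stylometric feature vectors."""
    denom = np.linalg.norm(feats_a) * np.linalg.norm(feats_b)
    return float(np.dot(feats_a, feats_b) / denom) if denom else 0.0

def residualized_similarity(feats_a, feats_b, residual_model):
    """Final score = interpretable score + neural residual correction.

    The residual model is trained to predict the *error* of the
    interpretable score, so the base score remains inspectable.
    """
    base = interpretable_similarity(feats_a, feats_b)
    residual = residual_model(feats_a, feats_b)
    return base + residual, base, residual

# Toy stand-in for a trained residual network (hypothetical values).
toy_residual = lambda a, b: 0.05

a = np.array([1.0, 0.0, 2.0])
b = np.array([1.0, 0.5, 1.5])
score, base, res = residualized_similarity(a, b, toy_residual)
print(base, res, score)
```

Because the final score decomposes additively, one can report exactly how much of the decision came from the interpretable features and how much from the neural correction.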


How Well Do LLMs Imitate Human Writing Style?

Jemama, Rebira, Kumar, Rajesh

arXiv.org Artificial Intelligence

Large language models (LLMs) can generate fluent text, but their ability to replicate the distinctive style of a specific human author remains unclear. We present a fast, training-free framework for authorship verification and style imitation analysis. The method integrates TF-IDF character n-grams with transformer embeddings and classifies text pairs through empirical distance distributions, eliminating the need for supervised training or threshold tuning. It achieves 97.5% accuracy on academic essays and 94.5% in cross-domain evaluation, while reducing training time by 91.8% and memory usage by 59% relative to parameter-based baselines. Using this framework, we evaluate five LLMs from three separate families (Llama, Qwen, Mixtral) across four prompting strategies - zero-shot, one-shot, few-shot, and text completion. Results show that the prompting strategy has a more substantial influence on style fidelity than model size: few-shot prompting yields up to 23.5x higher style-matching accuracy than zero-shot, and completion prompting reaches 99.9% agreement with the original author's style. Crucially, high-fidelity imitation does not imply human-like unpredictability - human essays average a perplexity of 29.5, whereas matched LLM outputs average only 15.2. These findings demonstrate that stylistic fidelity and statistical detectability are separable, establishing a reproducible basis for future work in authorship modeling, detection, and identity-conditioned generation.
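The character n-gram component described above can be sketched in a few lines. This is a minimal stand-in, not the paper's pipeline: it uses raw trigram counts and cosine distance, whereas the paper combines TF-IDF weighting with transformer embeddings and decides via empirical distance distributions.

```python
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Character n-gram count profile of a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_distance(p, q):
    """Cosine distance between two sparse count profiles."""
    dot = sum(p[g] * q[g] for g in set(p) & set(q))
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - (dot / norm if norm else 0.0)

# Stylistically similar pair vs. an unrelated pair (toy examples).
same = cosine_distance(char_ngrams("the quick brown fox jumps"),
                       char_ngrams("the quick brown fox leaps"))
diff = cosine_distance(char_ngrams("the quick brown fox jumps"),
                       char_ngrams("colorless green ideas sleep"))
print(same, diff)
```

In a verification setting, such a distance would then be compared against empirical same-author and different-author distance distributions rather than a hand-tuned threshold.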


Human-AI Collaboration or Academic Misconduct? Measuring AI Use in Student Writing Through Stylometric Evidence

Oliveira, Eduardo Araujo, Mohoni, Madhavi, López-Pernas, Sonsoles, Saqr, Mohammed

arXiv.org Artificial Intelligence

Human-Artificial Intelligence (HAI) collaboration in writing offers opportunities to enhance efficiency and boost student confidence; however, it also carries risks, such as reduced creativity, over-reliance on AI-generated content, and threats to academic integrity (Kim & Lee, 2023). While the ethical use of AI in education is widely acknowledged as a way to enhance student learning (Cotton et al., 2023; Foltynek et al., 2023), the rise of Unauthorised Content Generation (UCG) presents a significant challenge to academic integrity. Measuring the extent and nature of HAI collaboration in academic contexts remains a critical challenge for educators, particularly as generative AI (genAI) tools become increasingly available and integrated into educational settings (Atchley et al., 2024; E. Oliveira et al., 2023). Distinguishing AI-generated text from human-authored content is necessary for understanding student learning behaviours, supporting skill development, and maintaining academic integrity. Analysing student writing patterns can help educators evaluate how students engage with AI tools, track their writing skill progression, and identify areas where additional support is needed (Pan et al., 2025). Existing detection tools for AI-assisted misconduct often lack reliability, explainability, and resilience to circumvention strategies such as paraphrasing (Cotton et al., 2023). These challenges highlight the need for innovative, transparent, and robust approaches to address the unacknowledged use of genAI in HAI collaboration within academic writing (Kasneci et al., 2023).



Trends and Challenges in Authorship Analysis: A Review of ML, DL, and LLM Approaches

Habib, Nudrat, Adewumi, Tosin, Liwicki, Marcus, Barney, Elisa

arXiv.org Artificial Intelligence

Authorship analysis plays an important role in diverse domains, including forensic linguistics, academia, cybersecurity, and digital content authentication. This paper presents a systematic literature review on two key sub-tasks of authorship analysis: Author Attribution and Author Verification. The review explores SOTA methodologies, ranging from traditional ML approaches to DL models and LLMs, highlighting their evolution, strengths, and limitations, based on studies conducted from 2015 to 2024. Key contributions include a comprehensive analysis of methods, their corresponding feature extraction techniques, datasets used, and emerging challenges in authorship analysis. The study highlights critical research gaps, particularly in low-resource language processing, multilingual adaptation, cross-domain generalization, and AI-generated text detection. This review aims to help researchers by giving an overview of the latest trends and challenges in authorship analysis. It also points out possible areas for future study. The goal is to support the development of better, more reliable, and accurate authorship analysis systems across diverse textual domains.


Masks and Mimicry: Strategic Obfuscation and Impersonation Attacks on Authorship Verification

Alperin, Kenneth, Leekha, Rohan, Uchendu, Adaku, Nguyen, Trang, Medarametla, Srilakshmi, Capote, Carlos Levya, Aycock, Seth, Dagli, Charlie

arXiv.org Artificial Intelligence

The increasing use of Artificial Intelligence (AI) technologies, such as Large Language Models (LLMs), has led to nontrivial improvements in various tasks, including accurate authorship identification of documents. However, while LLMs improve such defense techniques, they simultaneously provide a vehicle for malicious actors to launch new attack vectors. To combat this security risk, we evaluate the adversarial robustness of authorship models (specifically an authorship verification model) to potent LLM-based attacks. These attacks include an untargeted method, authorship obfuscation, and a targeted method, authorship impersonation; the objective is to mask or mimic, respectively, the writing style of an author while preserving the original text's semantics. By attacking an accurate authorship verification model, we achieve maximum attack success rates of 92% for obfuscation and 78% for impersonation.


Sui Generis: Large Language Models for Authorship Attribution and Verification in Latin

Schmidt, Gleb, Gorovaia, Svetlana, Yamshchikov, Ivan P.

arXiv.org Artificial Intelligence

This paper evaluates the performance of Large Language Models (LLMs) in authorship attribution and authorship verification tasks for Latin texts of the Patristic Era. The study showcases that LLMs can be robust in zero-shot authorship verification even on short texts without sophisticated feature engineering. Yet, the models can also be easily "misled" by semantics. The experiments also demonstrate that steering the model's authorship analysis and decision-making is challenging, unlike what is reported in studies dealing with high-resource modern languages. Although LLMs prove able to beat the traditional baselines under certain circumstances, obtaining a nuanced and truly explainable decision requires, at best, a lot of experimentation.


InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification

Hu, Yujia, Hu, Zhiqiang, Seah, Chun-Wei, Lee, Roy Ka-Wei

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable proficiency in a wide range of NLP tasks. However, when it comes to authorship verification (AV) tasks, which involve determining whether two given texts share the same authorship, even advanced models like ChatGPT exhibit notable limitations. This paper introduces a novel approach, termed InstructAV, for authorship verification. This approach utilizes LLMs in conjunction with a parameter-efficient fine-tuning (PEFT) method to simultaneously improve accuracy and explainability. The distinctiveness of InstructAV lies in its ability to align classification decisions with transparent and understandable explanations, representing a significant progression in the field of authorship verification. Through comprehensive experiments conducted across various datasets, InstructAV demonstrates its state-of-the-art performance on the AV task, offering high classification accuracy coupled with enhanced explanation reliability.


Authorship Verification based on the Likelihood Ratio of Grammar Models

Nini, Andrea, Halvani, Oren, Graner, Lukas, Gherardi, Valerio, Ishihara, Shunichi

arXiv.org Artificial Intelligence

Authorship Verification (AV) is the process of analyzing a set of documents to determine whether they were written by a specific author. This problem often arises in forensic scenarios, e.g., in cases where the documents in question constitute evidence for a crime. Existing state-of-the-art AV methods use computational solutions that are not supported by a plausible scientific explanation for their functioning and that are often difficult for analysts to interpret. To address this, we propose a method relying on calculating a quantity we call $\lambda_G$ (LambdaG): the ratio between the likelihood of a document given a model of the Grammar for the candidate author and the likelihood of the same document given a model of the Grammar for a reference population. These Grammar Models are estimated using n-gram language models that are trained solely on grammatical features. Despite not needing large amounts of data for training, LambdaG still outperforms other established AV methods with higher computational complexity, including a fine-tuned Siamese Transformer network. Our empirical evaluation based on four baseline methods applied to twelve datasets shows that LambdaG leads to better results in terms of both accuracy and AUC in eleven cases, and in all twelve cases if considering only topic-agnostic methods. The algorithm is also highly robust to important variations in the genre of the reference population in many cross-genre comparisons. In addition to these properties, we demonstrate how LambdaG is easier to interpret than the current state-of-the-art. We argue that the advantage of LambdaG over other methods is due to the fact that it is compatible with Cognitive Linguistic theories of language processing.
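The likelihood-ratio idea behind LambdaG can be sketched with toy bigram language models. This is a hedged illustration, not the paper's method: the "grammatical features" here are hand-made part-of-speech-like tokens, the models are add-one-smoothed bigram LMs, and all sequences are invented for the example.

```python
from collections import Counter, defaultdict
import math

def train_bigram_lm(sequences, vocab):
    """Train an add-one-smoothed bigram LM over tag sequences;
    return a function computing a sequence's log-likelihood."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1

    def logprob(seq):
        lp = 0.0
        for a, b in zip(seq, seq[1:]):
            total = sum(counts[a].values())
            lp += math.log((counts[a][b] + 1) / (total + len(vocab)))
        return lp

    return logprob

vocab = ["DET", "NOUN", "VERB", "ADJ"]
author_seqs = [["DET", "ADJ", "NOUN", "VERB"], ["DET", "NOUN", "VERB"]]
reference_seqs = [["NOUN", "VERB", "DET", "NOUN"], ["VERB", "DET", "NOUN"]]

author_lm = train_bigram_lm(author_seqs, vocab)
reference_lm = train_bigram_lm(reference_seqs, vocab)

# LambdaG-style score: log-likelihood under the candidate author's
# grammar model minus log-likelihood under the reference population's.
doc = ["DET", "ADJ", "NOUN", "VERB"]
lambda_g = author_lm(doc) - reference_lm(doc)
print(lambda_g)  # > 0 favors the candidate author
```

The interpretability claim follows from this structure: each bigram's contribution to the log-ratio can be inspected individually, showing which grammatical patterns pulled the decision toward or away from the candidate author.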